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Bayesian Analysis of Many-Pole Fits of Hadron Propagators in Lattice 
QCD 
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We use Bayes' probability theorem to analyze many-pole fits of hadron propagators. An alternative method 
of estimating values and uncertainties of the fit parameters is offered, which has certain advantages over the 
conventional methods. The probability distribution of the parameters of a fit is calculated. The relative probability 
of various models is calculated. 



1. Introduction 

A common procedure in Lattice QCD is to cal- 
culate a correlation function in a certain channel 
and then fit it as a sum of several exponentials 0- 
||. The parameters of the fits are estimated by 
minimizing \ 2 ■ To find the errors, ideally the cal- 
culations should be repeated many times, but this 
is impractical. Usually the jacknife or the boot- 
strap is performed, fnstead here we use Bayes's 
theorem to derive the parameters' probability dis- 
tribution for given data from the probability of 
the data for given parameters. Usually one de- 
termines the number of poles by comparing x 2 
for different models. It is very often ambiguous. 
Here we use the parameters probability distribu- 
tion to calculate relative probabilities of the pos- 
sible models with given data. We perform the 
Bayesian analysis for some model functions imi- 
tating Lattice QCD propagators. Then we apply 
this method to analyze SU(2) hadron propaga- 
tors. 

2. Bayes' theorem and its applications 

P(A\B) is the conditional probability that 
proposition A is true, given that proposition B 
is true. Bayes' theorem reads: 



P(T\D,I)P(D\I) = P{D\T,I)P(T\I), 



(1) 



where T is the theoretical model to be tested, 
D is the data, and I is the prior information. 
P(T\D,I) is the posterior probability of the the- 
oretical model. P(T\I) is the prior probability of 



the theoretical model. P(D\I) is the prior proba- 
bility of the data; it will be always absorbed into 
the normalization constant. P(D\T,I) is the di- 
rect probability of the data. For shortness I will 
be implicit in all formulae henceforth. 

We are interested in calculating the posterior 
probability of a theoretical model and its param- 
eters {c, E}: 



P{{c,E}\D) 



P({c,E}) 
P(D) 



P(D\{c,E}\). 



Given P({c,E}), and P(D\{c,E}) we can @: 

I. Calculate the posterior probability density 
P{E n \D) [P(c n \D)] for the parameters E n [c„] : 

P(E n \D) = J Uda.^dEi P({c,E}\D). 

II. Calculate the average values Ei and the 
standard deviations aE t (similarly for Cj) 



E n = J dE n E n P(E n \D), 

dE n (E n -EV t ) 2 P(E n \D), 



(2) 



provided P(E n \D) is normalized. 

III. Compare several models Tf, (for example 
one pole, two pole and three pole models). We 
cannot find the absolute probability of a theory, 
since we do not have the "complete set" of theo- 
ries. But we can calculate the relative probabili- 
ties of two theories: 



P(T t \D) = P(Ti)P(D\Tj) 
P(Tj\D) P{T J )P{D\T j )' 



(3) 



2 



P(D\T) can be obtained from P{D\{c, E}) by in- 
tegrating over all parameters of the theory: 

P{D\T) = J{dc}{dE}P({c,E})P(D\{c,E}). (4) 

For the prior probability P({c, E}) of the pa- 
rameters of a model in section we make the 
"least informative" assumption jfji-^c, E) dc dE 
~ dc/cdE j 'E. This form is scale invariant. Priors 
P({c, E}) should be normalized. 

The direct probability P(D\{c, E}) of the data 
D can be calculated relatively easily if the data is 
Gaussian distributed. We generate "fake" data to 
be used in the analysis. We use an n-pole model, 
add noise e(t) to generate a sample of N "propa- 
gators" g a (t)= E?=i c ie - E ^ + e(t) (a = 1,..,N), 
calculate the average G(t) and estimate the co- 
variance matrix C(t, t') from the data. Here t is 
the discrete, "lattice" time. We vary the number 
of "propagators" to control the noise level in the 
data. Here we use N — 360 and N = 3600, which 
corresponds to a decrease in the noise level by a 
factor of 3. 

The probability distribution of G(t) is jjj 



P(D\{c,E}) = 



- p -X 2 ({c,E})/2 



(5) 



where \ 2 is calculated using the full covariance 
matrix {[j. The individual g a (i) need not be 
Gaussian distributed, as long as we average over 
enough of them so that G(t) are. Gaussian dis- 
tribution of the "fake" data is ensured. 

3. Estimating the Parameters. 

3.1. 1 pole data. 

We generate data D for the one pole model with 
cf = 0.15 and E\ n = 0.485. We use the one pole 
model to fit the data. Here we assume the prior 
probability of the data P({c, E}) to be constant. 
Then the posterior probability of the parameters 
P({c, E}\D) is up to a constant equal to the direct 
probability of the data given by equation (||). The 
posterior probability density for E\ 



P{E X \D) 



Jdc 



i e 



-X 2 (ci,Bi)/2 



/ dcidE 



l e 



-X 2 (c 1 ,E 1 )/2- 



(6) 



It begs for the Monte Carlo integration with the 
Metropolis algorithm. 



We generate a set of points (a, Ex). Every 
point is characterized by x 2 (ci,i?i)- We sample 
the vicinity of the minimum of x 2 ( c i; Ei) [maxi- 
mum of exp{— x 2 (ci, ^i))](Fig.l). 
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Figure 1. \ 2 vs - the point number n. 

We make a scatter plot c% vs.Ei (Fig. 2) to visu- 
alize this distribution. The density of the points 
is proportional to the weight exp(— x 2 (ci, -Ei))- 
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Figure 2. Scatter plot of % 2 in (c, E) space. 

Taking integrals (^|) is equivalent to making a 
histogram with steps big enough to make the dis- 
tribution smooth (Fig. 3). 
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Figure 3. Probability distribution P(E). 



Both equation (||) and jacknife give the same 
values for the parameters and errors of E = 



3 



0.4851(1) and c = 0.1499(2). The present ap- 
proach conserves computational time compared 
to the jacknife. If one uses simulated annealing 
to fit the data, one covers the same regions in the 
(ci,Ei) space as needed to calculate probability 
distributions (||). With the probability distribu- 
tions one immediately obtains the parameters and 
errors, whereas with the jacknife the fitting has 
to be repeated N times. 

3.2. 2 Pole data 

We repeat the analysis performed in section 3.1 
for two-pole data when the poles are well sepa- 
rated: cf =0.1, 4 n =0.1 E{ n =0.5 E| n =0.6. We 
use the two-pole model to fit the data. The prob- 
ability distributions for the energies and coeffi- 
cients are obtained and the parameters are esti- 
mated just as in section 3.1. 

The only complication is that now we have 
to deal with the 4-d space of parameters 
(ci,Ei,C2,E2). We perform a Monte Carlo inte- 
gration as described above. The 4-d probability 
distribution is visualized by projecting it onto two 
planes (ci,E\) and {ci,Ei)- Each blob in Fig. 4 
is a projection of the 4-d distribution on a 2-d 
plane. Each blob represents the probability dis- 
tribution for one pair of (ci,Ei) after the second 
pair has been integrated out. 
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Figure 4. Scatter plot of \ 2 projected on (ci, E\) 
and (02, E2) planes and combined on one plane. 

4. Model selection 

Unless one has some prior knowledge, deter- 
mining the number of poles present in the data 
based on comparing x 2 for models with different 
numbers of poles is very often ambiguous. 

Table [D contains the values of \ 2 obtained when 



1 and 2 pole data are fit with 1 and 2 pole models. 
We use the one pole data from section 3.1 and 
generate 2 pole data with the poles close to each 
other: cf = 0.05 c| n = 0.1 E{ n = 0.49 Ef = 0.5. 
With one exception the corresponding values of 
\ 2 are close and do not allow for a definite answer. 



data 


# of propagators 


model 


X'/dof 






1 pole 


0.61 




360 


2 pole 


0.57 


1 pole 




1 pole 


1.2 




3600 


2 pole 


1.3 






1 pole 


0.87 




360 


2 pole 


0.99 


2 pole 




1 pole 


2.0 




3600 


2 pole 


1.2 



Table 1 



Values of x 2 f° r fitting 1 and 2 pole data of dif- 
ferent noise levels with 1 and 2 pole models. 



We need to calculate the total probabilities ra- 
tio (||) to determine the number of poles in the 
data. We assume P(2pole) — P(lpole), i.e. a 
priori these two models are equally probable. To 
calculate P{D\\pole) we substitute (||) into equa- 
tion (^) (and similarly for P(D\2pole)) 

r e -x 2 (ci,i?i) 
P(D\lpole) = / cfcidEi (7) 

The integration is tricky since we are dealing with 
a function that varies rapidly in the multidimen- 
sional space. We use scatter plots to determine 
the areas of integration. Table contains in- 
tegration results for the total probability ratio 
R = p(2polejgj • With a sufficiently low noise 
level the total probabilities ratio picks the cor- 
rect model. If we estimate the parameters of the 
2 pole data with 3600 propagators using equation 
(||) , we get the input parameters back within the 
error bars. The jacknife here gives unreasonably 
large errors because x 2 nas several minima in the 
(ci, C2, Ei, E2) space. It can be seen from the 
graph of x 2 vs. n (Fig. 7), and they can be also 
clearly identified on the scatter plot (Fig. 8). Two 



4 



data 


# of propagators 


R 




360 


3 


1 pole 


3600 


30 




360 


3 


2 pole 


3600 


0.02 



Table 2 

Total probabilities ratio R. 

minima ( 1 and 3) have the same x 2 . When we 
perform the jacknife, the blind fitting finds either 
minimum 1 or 3, which results in the unreason- 
ably large error estimates. 




Figure 7. \ 2 nas multiple minima. 




0.54 



Figure 8. Three minima of y 2 from Fig. 7 pro- 
jected on the two-dimensional plane (c,E). 



5. Analysis of SU(2) data 

Here we analyze hadron propagators in the 
pseudoscalar channel. The detailed description 
of these data is given in ||,|5| . The coupling con- 
stant (3 — 2.5, lattice spacing a = 0.09 ± 0.012 
fm, the lattice size is 12 3 x 24. The propagators 
analyzed here were calculated with k — 0.146 for 
360 configurations. The point source is at t = 5. 
The fit is performed in the time range 6-20. 



We repeat the analysis for 60 and 360 config- 
urations to study the data with different noise 
levels. For 60 configurations the x 2 fdof is 1-0 for 
the 3 pole fit and 1.17 for the 4 pole fit, and the 
total probabilities ratio is p|^°^| ~ 1. For 360 
configurations the x 2 /dof is 0.52 for the 3 pole 
model fit and 0.69 for the 4 pole model fit, and 
the total probabilities ratio for 360 configurations 
is p(4p°je|^] ~ 10. Here again, for low noise data 
we are able to choose between two models based 
on a qualitative estimate given by the total prob- 
abilities ratio. 

6. Conclusion 

A new method has been introduced that can be 
used to analyze many-pole fits of hadron propa- 
gators in Lattice QCD. It has been used to esti- 
mate the many-pole model parameters and their 
uncertainties. It works in the presence of multi- 
ple minima when the jacknife (at least in its sim- 
pleminded form) fails. It cuts the computational 
time. The new method has been used to calculate 
relative probabilities for different models, which 
can be crucial in making the optimal choice of a 
model. The scatter plots, which have been in- 
troduced as an auxiliary tool for the multidimen- 
sional integration, can be used as an independent 
tool for the many pole fit analysis. 

I wish to thank Greg Kilcup for his suggestion 
to use Bayesian analysis. 
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